SPACEc: Clustering¶
After preprocessing the single-cell data, the next step is to assign cell types. One of the most common approaches to identifying cell types is unsupervised or semi-supervised clustering. SPACEc uses the widely used scanpy library or pyFlowSOM for this task. You can adjust the clustering resolution and the number of nearest neighbors to change the number of identified clusters. This flexible design lets you choose a clustering strategy that suits your research question and the available dataset.
If you work with very large datasets, consider using GPU-accelerated Leiden clustering. Check our GitHub page for installation instructions.
This notebook utilizes the scanpy library for clustering and visualization.
# import spacec first
import spacec as sp
#import standard packages
import os
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt
# silencing warnings
import warnings
warnings.filterwarnings('ignore')
plt.rc('axes', grid=False) # remove gridlines
sc.settings.set_figure_params(dpi=80, facecolor='white') # set dpi and background color for scanpy figures
# Specify the path to the data
root_path = "/home/jiawu2/SPACEc_image" # insert your own path
data_path = os.path.join(root_path, 'data/') # where the data is stored
# where you want to store the output
output_dir = os.path.join(root_path, 'results/')
os.makedirs(output_dir, exist_ok=True)
# Load the denoised/filtered AnnData object from notebook 2
adata = sc.read(os.path.join(output_dir, 'adata_nn_demo.h5ad'))
adata # check the adata
AnnData object with n_obs × n_vars = 50037 × 58
obs: 'label', 'cell_id', 'DAPI', 'x', 'y', 'area', 'region_num', 'unique_region', 'condition'
Clustering¶
Setting a random seed makes the clustering reproducible: rerunning the notebook with the same seed, data, and software environment should yield the same clusters. This is important if you want to change or correct things later on.
clustering_random_seed = 0
Before you start annotating your cells, develop a clustering strategy. A common approach is to start with a coarse annotation (immune cell, tumor cell, etc.) and then refine the clusters. Another is to overcluster the dataset and then merge populations that were split. Depending on your dataset, you will often end up using a mixed approach. Best practice is to start clustering with the set of markers that best describes your cell types; functional markers such as PD1 are better reserved for later, when you refine your clusters. In this simple example we start with a fairly large collection of markers and use several rounds of subclustering to improve the results iteratively.
# This step can be long if you have large phenocycler images
# Use these cell-type-specific markers for cell type annotation
marker_list = [
'FoxP3', 'HLA-DR', 'EGFR', 'CD206', 'BCL2', 'panCK', 'CD11b', 'CD56', 'CD163', 'CD21', 'CD8',
'Vimentin', 'CCR7', 'CD57', 'CD34', 'CD31', 'CXCR5', 'CD3', 'CD38', 'LAG3', 'CD25', 'CD16', 'CLEC9A', 'CD11c',
'CD68', 'aSMA', 'CD20', 'CD4','Podoplanin', 'CD15', 'betaCatenin', 'PAX5',
'MCT', 'CD138', 'GranzymeB', 'IDO-1', 'CD45', 'CollagenIV', 'Arginase-1']
# clustering
adata = sp.tl.clustering(
adata,
clustering='leiden', # can choose between leiden and louvain
n_neighbors=10, # number of neighbors for the knn graph
resolution = 0.5, # clustering resolution (higher resolution gives more clusters)
reclustering = False, # if True, the neighbors are not recomputed
marker_list = marker_list, # if None, all variable names are used for clustering
seed=clustering_random_seed, # random seed for clustering - reproducibility
)
Computing neighbors and UMAP - neighbors - UMAP Clustering Leiden clustering
Visualizing your results as a UMAP scatter plot helps to identify batch effects and to estimate how well the clusters are separated. What we want to see is good separation between the clusters (left panel) and poor separation between the regions (right panel).
# visualization of clustering with UMAP
sc.pl.umap(adata, color = ['leiden_0.5', 'unique_region'], wspace=0.5)
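The visual check can be complemented with a simple table of how the regions are distributed within each cluster. The sketch below uses a small hypothetical `obs` table standing in for `adata.obs`; the column names follow the ones used above, while the values and the 0.9 threshold are assumptions chosen only for illustration.

```python
import pandas as pd

# Toy stand-in for adata.obs (hypothetical values, for illustration only)
obs = pd.DataFrame({
    "leiden_0.5": ["0", "0", "0", "0", "1", "1", "1", "1"],
    "unique_region": ["reg001", "reg002", "reg001", "reg002",
                      "reg001", "reg001", "reg001", "reg001"],
})

# Fraction of each cluster's cells that comes from each region (rows sum to 1)
region_fraction = pd.crosstab(
    obs["leiden_0.5"], obs["unique_region"], normalize="index"
)
print(region_fraction)

# Flag clusters dominated by a single region (the threshold is an arbitrary choice)
dominated = region_fraction.max(axis=1) >= 0.9
suspect_clusters = list(dominated[dominated].index)
print("Clusters dominated by one region:", suspect_clusters)
```

A cluster that comes almost entirely from one region is not necessarily a batch effect, since it can also reflect real biology, but it is a reasonable candidate for a closer look.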
This plot shows the marker expression profile per cluster and helps to identify clusters that need subclustering. Subclustering splits a cluster into a number of subclusters, to enhance clustering resolution for this specific subset of cells.
sc.pl.dotplot(adata,
marker_list, # The list of markers to show on the x-axis
'leiden_0.5', # The cluster column
dendrogram = True) # Show the dendrogram
WARNING: dendrogram data not found (using key=dendrogram_leiden_0.5). Running `sc.tl.dendrogram` with default parameters. For fine tuning it is recommended to run `sc.tl.dendrogram` independently.
Subclustering round 1¶
# subcluster selected clusters sequentially (optional for your own data)
sc.tl.leiden(adata,
seed=clustering_random_seed, # random seed for clustering - reproducibility
restrict_to=('leiden_0.5',['15']), # select the cluster column name (your previously generated key) and the cluster name you want to subcluster
resolution=0.1, # resolution for subclustering
key_added='leiden_0.5_subcluster') # key added to adata.obs (keep it the same to avoid confusion and limit the adata object size)
# repeat the same for other clusters you want to subcluster
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['1']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['3']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['7']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['11']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['12']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['14']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['2']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['4']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['5']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['6']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['13']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['10']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['8']), resolution=0.1, key_added='leiden_0.5_subcluster')
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('leiden_0.5_subcluster',['9']), resolution=0.1, key_added='leiden_0.5_subcluster')
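Judging from the keys used in the annotation dictionary of this notebook, subclustered cells receive labels of the form `parent,child` (for example `10,2`), while clusters that were not subclustered keep their original label. The following minimal sketch, which uses a hypothetical label series in place of `adata.obs['leiden_0.5_subcluster']`, summarizes how many subclusters each parent cluster was split into.

```python
import pandas as pd

# Hypothetical labels standing in for adata.obs['leiden_0.5_subcluster']
labels = pd.Series(
    ["0", "0", "10,0", "10,1", "10,1", "10,2", "15,0", "15,1"],
    dtype="category",
)

# Parent cluster = text before the first comma (unchanged if there is no comma)
parent = labels.astype(str).str.split(",").str[0]

# Number of cells and number of distinct subclusters per parent cluster
summary = (
    pd.DataFrame({"parent": parent, "subcluster": labels.astype(str)})
    .groupby("parent")["subcluster"]
    .agg(n_cells="size", n_subclusters="nunique")
)
print(summary)
```

A table like this makes it easier to confirm that every cluster you intended to split was actually split before you start writing the annotation dictionary.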
# Visualize cluster expression profiles
sc.pl.dotplot(adata,
marker_list,
'leiden_0.5_subcluster', # The cluster column (now use the subcluster column)
dendrogram = False)
Once you feel ready for the first round of annotation, you can create a dictionary that renames each cluster with a corresponding biological name. Be aware that dense regions sometimes lead to signal spillover. Spillover can only be corrected to a certain degree and often leaves cells slightly positive for the markers of neighboring cells. The best practice for precise annotation is to inspect the spatial position of the annotated cells, either with the catplot function or with the TissUUmaps module.
If you are unsure about a cluster and need further subclustering to resolve mixed populations, give it a placeholder name such as recluster.
# tentative annotation based on the marker
cluster_to_ct_dict = {
'0': 'noise',
'2,0': 'noise',
'2,1': 'noise',
'1,0': 'B cell CD20+CD21+',
'3,0': 'B cell CD20+CXCR5+',
'3,1': 'unknown',
'4,0': 'B cell CD20+CD21+',
'4,1': 'B cell CD20+CXCR5+',
'5,0': 'Epithelial cell EGFR+betaCatenin+CD138+',
'5,1': 'Epithelial cell EGFR+betaCatenin+CD138+',
'6,0': 'CD8 T cell',
'7,0': 'Treg CCR7+',
'7,1': 'Treg IDO-1+',
'8,0': 'M1 Macrophage CD11c+CD68+',
'8,1': 'M2 Macrophage CD11B+CD163+',
'9,0': 'unknown',
'9,1': 'unknown',
'10,0': 'Endothelial cell CD34+CD31+',
'10,1': 'Neutrophil',
'10,2': 'NK cell',
'10,3': 'vessel aSMA+',
'11,0': 'Treg',
'11,1': 'Treg',
'12,0': 'M2 Macrophage CD206+',
'12,1': 'M2 Macrophage CD206+',
'13,0': 'DC',
'13,1': 'DC',
'14,0': 'unknown',
'14,1': 'unknown',
'14,2': 'unknown',
'15,0': 'MCT+',
'15,1': 'MCT+',
}
# This generates a new column named cell_type_coarse based on the leiden_0.5_subcluster column
adata.obs['cell_type_coarse'] = ( # create a new column
adata.obs['leiden_0.5_subcluster'] # get the cluster names
.map(cluster_to_ct_dict) # map the cluster names to cell types
.astype('category') # convert to category
)
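Note that pandas `Series.map` returns `NaN` for every label that is missing from the dictionary, so a forgotten key silently produces unannotated cells. The sketch below shows one way to check for this; the labels and the deliberately incomplete `toy_dict` are hypothetical and stand in for `adata.obs['leiden_0.5_subcluster']` and `cluster_to_ct_dict`.

```python
import pandas as pd

# Hypothetical subcluster labels and a deliberately incomplete mapping
subclusters = pd.Series(["0", "1,0", "1,1", "6,0", "6,0"], dtype="category")
toy_dict = {"0": "noise", "1,0": "B cell CD20+CD21+", "6,0": "CD8 T cell"}

# Labels present in the data but absent from the dictionary
missing = sorted(set(subclusters.cat.categories) - set(toy_dict))
print("Unmapped labels:", missing)

# Mapping leaves NaN for the unmapped labels
cell_type = subclusters.map(toy_dict)
n_unannotated = int(cell_type.isna().sum())
print("Cells without annotation:", n_unannotated)
```

Running a check like this before the mapping step makes it easy to spot typos in the dictionary keys.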
First QC¶
After the first round of annotation you should check your results.
- Make sure that each cell type expresses the correct markers.
- Check the spatial position of cell types (consider speaking to a domain expert if you are unsure about the tissue)
- Check the frequencies of cells - do these numbers fit with the biology of your sample?
Try to take your time and evaluate each step carefully to achieve the best results.
# Check the marker expression of the annotated cell types
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse', dendrogram = False)
sp.pl.catplot(
adata,
color = "cell_type_coarse", # specify group column name here (e.g. celltype_fine)
unique_region = "condition", # specify unique_regions here
X='x', Y='y', # specify x and y columns here
n_columns=2, # adjust the number of columns for plotting here (how many plots do you want in one row?)
palette='tab20', #default is None which means the color comes from the anndata.uns that matches the UMAP
savefig=False, # save figure as pdf
output_fname = "", # change it to file name you prefer when saving the figure
output_dir=output_dir, # specify output directory here (if savefig=True)
figsize= 17, # specify the figure size here
size = 20) # specify the size of the points
| | x | y | cell_type_coarse | condition |
|---|---|---|---|---|
| 0 | 1322.675214 | 5.252137 | noise | tonsillitis |
| 1 | 1472.197452 | 5.356688 | Neutrophil | tonsillitis |
| 2 | 1505.800000 | 5.072727 | Neutrophil | tonsillitis |
| 3 | 641.724832 | 8.741611 | noise | tonsillitis |
| 4 | 1304.100000 | 9.300000 | noise | tonsillitis |
| ... | ... | ... | ... | ... |
| 22255 | 1456.914062 | 2521.546875 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
| 22256 | 442.215339 | 2522.433628 | noise | tonsillitis |
| 22257 | 1438.561644 | 2522.406393 | noise | tonsillitis |
| 22258 | 1383.661972 | 2523.711268 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
| 22259 | 1420.271739 | 2524.836957 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
22260 rows × 4 columns
# print the frequencies of cell types
adata.obs['cell_type_coarse'].value_counts()
cell_type_coarse
noise  15992
B cell CD20+CD21+  8259
B cell CD20+CXCR5+  5751
unknown  3138
CD8 T cell  2707
Epithelial cell EGFR+betaCatenin+CD138+  2707
M1 Macrophage CD11c+CD68+  2480
Endothelial cell CD34+CD31+  1790
Treg  1570
M2 Macrophage CD206+  1387
Treg CCR7+  1379
Treg IDO-1+  1288
DC  896
MCT+  247
Neutrophil  180
NK cell  150
vessel aSMA+  80
M2 Macrophage CD11B+CD163+  36
Name: count, dtype: int64
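Relative frequencies are often easier to compare with biological expectations than raw counts. The sketch below converts the counts printed above into percentages; the numbers are copied from that output, and on the real object the same result should be obtainable with `adata.obs['cell_type_coarse'].value_counts(normalize=True)`.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({
    "noise": 15992,
    "B cell CD20+CD21+": 8259,
    "B cell CD20+CXCR5+": 5751,
    "unknown": 3138,
    "CD8 T cell": 2707,
    "Epithelial cell EGFR+betaCatenin+CD138+": 2707,
    "M1 Macrophage CD11c+CD68+": 2480,
    "Endothelial cell CD34+CD31+": 1790,
    "Treg": 1570,
    "M2 Macrophage CD206+": 1387,
    "Treg CCR7+": 1379,
    "Treg IDO-1+": 1288,
    "DC": 896,
    "MCT+": 247,
    "Neutrophil": 180,
    "NK cell": 150,
    "vessel aSMA+": 80,
    "M2 Macrophage CD11B+CD163+": 36,
}, name="count")

# Percentage of all cells per cell type, rounded to one decimal place
percent = (counts / counts.sum() * 100).round(1)
print(percent)
```

The counts add up to the 50037 cells of the loaded object, and roughly a third of them are currently labeled as noise, which is worth keeping in mind for the later filtering step.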
Subclustering round 2¶
Repeat the previous procedure. You may need to do this several times, depending on the size and complexity of your dataset and on your staining quality.
sc.tl.leiden(adata,
seed=clustering_random_seed,
restrict_to=('cell_type_coarse',['unknown']), # select the cluster column name (your previously generated key) and the cluster name you want to subcluster
resolution=1.0,
key_added='cell_type_coarse_subcluster') # new column added to adata.obs
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse_subcluster', dendrogram = False)
# tentative annotation based on the marker
cluster_to_ct_dict = {
'unknown,0': 'Plasma cell',
'unknown,1': 'recluster',
'unknown,2': 'Plasma cell',
'unknown,3': 'recluster',
'unknown,4': 'Plasma cell',
'unknown,5': 'Plasma cell',
'unknown,6': 'Plasma cell',
'unknown,7': 'recluster',
'unknown,8': 'Plasma cell',
'unknown,9': 'Plasma cell',
'unknown,10': 'Plasma cell',
'unknown,11': 'Plasma cell',
'unknown,12': 'recluster',
'unknown,13': 'recluster',
'unknown,14': 'recluster',
'unknown,15': 'Plasma cell',
'B cell CD20+CD21+': 'B cell CD20+CD21+',
'B cell CD20+CXCR5+': 'B cell CD20+CXCR5+',
'CD8 T cell': 'CD8 T cell',
'Epithelial cell EGFR+betaCatenin+CD138+': 'Epithelial cell EGFR+betaCatenin+CD138+',
'M1 Macrophage CD11c+CD68+': 'M1 Macrophage CD11c+CD68+',
'Endothelial cell CD34+CD31+': 'Endothelial cell CD34+CD31+',
'Treg': 'Treg',
'M2 Macrophage CD206+': 'M2 Macrophage CD206+',
'Treg CCR7+': 'Treg CCR7+',
'Treg IDO-1+': 'Treg IDO-1+',
'DC': 'DC',
'Neutrophil': 'Neutrophil',
'NK cell': 'NK cell',
'vessel aSMA+': 'vessel aSMA+',
'M2 Macrophage CD11B+CD163+': 'M2 Macrophage CD11B+CD163+',
'noise': 'noise',
'MCT+': 'MCT+',
}
adata.obs['cell_type_coarse_f'] = (
adata.obs['cell_type_coarse_subcluster']
.map(cluster_to_ct_dict)
.astype('category')
)
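Listing every unchanged label as `'X': 'X'` is easy to get wrong. One alternative, sketched here with hypothetical labels standing in for `adata.obs['cell_type_coarse_subcluster']`, is to build an identity mapping from the existing categories and then override only the entries that change.

```python
import pandas as pd

# Hypothetical labels standing in for adata.obs['cell_type_coarse_subcluster']
labels = pd.Series(
    ["DC", "Treg", "unknown,0", "unknown,1", "unknown,0"], dtype="category"
)

# Start from identity (every label maps to itself) ...
mapping = {label: label for label in labels.cat.categories}
# ... then override only the subclusters that were re-annotated
mapping.update({"unknown,0": "Plasma cell", "unknown,1": "recluster"})

refined = labels.map(mapping).astype("category")
print(refined.value_counts())
```

Because the identity entries are generated from the data, no existing label can be dropped by accident, and the dictionary you maintain by hand only contains the decisions made in the current round.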
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse_f', dendrogram = False)
# print the frequencies of cell types
adata.obs['cell_type_coarse_f'].value_counts()
cell_type_coarse_f
noise  15992
B cell CD20+CD21+  8259
B cell CD20+CXCR5+  5751
CD8 T cell  2707
Epithelial cell EGFR+betaCatenin+CD138+  2707
M1 Macrophage CD11c+CD68+  2480
Plasma cell  2130
Endothelial cell CD34+CD31+  1790
Treg  1570
M2 Macrophage CD206+  1387
Treg CCR7+  1379
Treg IDO-1+  1288
recluster  1008
DC  896
MCT+  247
Neutrophil  180
NK cell  150
vessel aSMA+  80
M2 Macrophage CD11B+CD163+  36
Name: count, dtype: int64
If you encounter a cell population that seems impossible to annotate, check carefully whether the cells resemble noise or a segmentation artefact. In our example dataset, we encountered an edge effect during segmentation, so it is safe to remove the cells labeled as noise. Evaluate every case carefully, and never drop cells unless you are sure they were picked up by mistake.
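One way to support the suspicion of an edge effect is to compare how far the candidate noise cells lie from the image border relative to other cell types. The sketch below is only an illustration under stated assumptions: the coordinates and labels are hypothetical, and the bounding box of all cells is used as a rough proxy for the image border. With several regions, the same calculation would need to be done per region.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates and labels standing in for adata.obs
obs = pd.DataFrame({
    "x": [5.0, 500.0, 995.0, 480.0, 520.0, 10.0],
    "y": [400.0, 3.0, 600.0, 510.0, 490.0, 990.0],
    "cell_type_coarse_f": ["noise", "noise", "noise", "DC", "Treg", "noise"],
})

# Distance of each cell to the nearest side of the bounding box of all cells
obs["dist_to_edge"] = np.minimum.reduce([
    obs["x"] - obs["x"].min(),
    obs["x"].max() - obs["x"],
    obs["y"] - obs["y"].min(),
    obs["y"].max() - obs["y"],
])

# Median border distance per annotated cell type
median_dist = obs.groupby("cell_type_coarse_f")["dist_to_edge"].median()
print(median_dist)
```

If the cells labeled as noise sit much closer to the border than the other populations, that is consistent with an edge artefact. It does not replace a visual inspection of the spatial plot.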
# remove noise (copy so that adata is a standalone object rather than a view)
adata = adata[~adata.obs['cell_type_coarse_f'].isin(['noise'])].copy()
Subclustering round 3¶
Repeat the previous steps...
sc.tl.leiden(adata, seed=clustering_random_seed, restrict_to=('cell_type_coarse_f',['recluster']), resolution=0.5, key_added='cell_type_coarse_f_subcluster')
sc.pl.dotplot(adata, marker_list, 'cell_type_coarse_f_subcluster', dendrogram = False)
Scaling your data can boost contrast and help you decide on clusters that are difficult to annotate.
# scale and store results in layer
adata.layers["scaled"] = sc.pp.scale(adata, copy=True).X
sc.pl.matrixplot(
adata,
marker_list,
"cell_type_coarse_f_subcluster",
dendrogram=False,
colorbar_title="mean z-score",
layer="scaled",
vmin=-2,
vmax=2,
cmap="RdBu_r",
)
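To make the "mean z-score" shown in the matrix plot more concrete, the sketch below computes per-marker z-scores and their per-cluster means on a tiny hypothetical matrix. It illustrates the general idea only; details such as the variance estimator or optional clipping may differ from what `sc.pp.scale` does internally, and the marker and cluster names are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical intensities: 6 cells x 2 markers
X = np.array([
    [1.0, 10.0],
    [2.0, 10.0],
    [3.0, 10.0],
    [7.0, 40.0],
    [8.0, 40.0],
    [9.0, 40.0],
])
clusters = np.array(["A", "A", "A", "B", "B", "B"])

# Per-marker z-score: subtract the column mean, divide by the column std
z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Mean z-score per cluster, analogous to what the matrix plot colors encode
mean_z = pd.DataFrame(z, columns=["CD3", "CD20"]).groupby(clusters).mean()
print(mean_z.round(2))
```

After scaling, every marker has mean 0 and unit variance across all cells, so a positive mean z-score indicates that a cluster expresses the marker above the dataset average, regardless of the marker's raw intensity range.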
# tentative annotation based on the marker
cluster_to_ct_dict = {
'B cell CD20+CD21+': 'B cell CD20+CD21+',
'B cell CD20+CXCR5+': 'B cell CD20+CXCR5+',
'CD8 T cell': 'CD8 T cell',
'Epithelial cell EGFR+betaCatenin+CD138+': 'Epithelial cell EGFR+betaCatenin+CD138+',
'M1 Macrophage CD11c+CD68+': 'M1 Macrophage CD11c+CD68+',
'Endothelial cell CD34+CD31+': 'Endothelial cell CD34+CD31+',
'Treg': 'Treg',
'M2 Macrophage CD206+': 'M2 Macrophage CD206+',
'Treg CCR7+': 'Treg CCR7+',
'Treg IDO-1+': 'Treg IDO-1+',
'DC': 'DC',
'Neutrophil': 'Neutrophil',
'NK cell': 'NK cell',
'vessel aSMA+': 'vessel aSMA+',
'M2 Macrophage CD11B+CD163+': 'M2 Macrophage CD11B+CD163+',
'noise': 'noise',
'Plasma cell': 'Plasma cell',
'MCT+': 'MCT+',
'recluster,3': 'NK cell',
'recluster,0': 'CLEC9A+IDO-1+',
'recluster,1': 'CLEC9A+IDO-1+',
'recluster,2': 'CLEC9A+IDO-1+',
'recluster,6': 'Treg',
'recluster,4': 'noise',
'recluster,5': 'noise',
}
adata.obs['cell_type'] = (
adata.obs['cell_type_coarse_f_subcluster']
.map(cluster_to_ct_dict)
.astype('category')
)
# drop noise (copy so that adata is a standalone object rather than a view)
adata = adata[~adata.obs['cell_type'].isin(['noise'])].copy()
Final QC¶
As mentioned previously, careful re-evaluation is key to cell annotation. Before saving your data, check the annotation one more time.
ax = sc.pl.heatmap(
adata,
marker_list,
groupby="cell_type",
layer="scaled",
vmin=-2,
vmax=2,
cmap="RdBu_r",
dendrogram=False,
swap_axes=True,
figsize=(40, 10),
)
# store the annotated adata
adata.write(output_dir + "adata_nn_demo_annotated.h5ad")
Single-cell visualization¶
# list of cell types
adata.obs['cell_type'].value_counts()
cell_type
B cell CD20+CD21+  8259
B cell CD20+CXCR5+  5751
CD8 T cell  2707
Epithelial cell EGFR+betaCatenin+CD138+  2707
M1 Macrophage CD11c+CD68+  2480
Plasma cell  2130
Endothelial cell CD34+CD31+  1790
Treg  1603
M2 Macrophage CD206+  1387
Treg CCR7+  1379
Treg IDO-1+  1288
DC  896
CLEC9A+IDO-1+  731
MCT+  247
NK cell  247
Neutrophil  180
vessel aSMA+  80
M2 Macrophage CD11B+CD163+  36
Name: count, dtype: int64
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex
#make sure cell_type is categorical (Scanpy uses category order)
adata.obs["cell_type"] = adata.obs["cell_type"].astype("category")
cell_types = list(adata.obs["cell_type"].cat.categories)
# build a large distinct color pool
def make_color_pool():
pool = []
for cmap_name in ["tab20", "tab20b", "tab20c"]:
cmap = plt.get_cmap(cmap_name)
pool.extend([to_hex(cmap(i)) for i in range(cmap.N)])
return pool
pool = make_color_pool()
#if you have more types than pool, extend with hsv (fallback)
if len(cell_types) > len(pool):
hsv = plt.get_cmap("hsv")
extra = [to_hex(hsv(i / (len(cell_types) - len(pool)))) for i in range(len(cell_types) - len(pool))]
pool = pool + extra
# map each cell type -> color (in category order)
cell_type_colors = {ct: pool[i] for i, ct in enumerate(cell_types)}
# tell scanpy the colors in EXACT same order as categories
adata.uns["cell_type_colors"] = [cell_type_colors[ct] for ct in cell_types]
import pandas as pd
pd.DataFrame({"cell_type": cell_types, "color": adata.uns["cell_type_colors"]})
| | cell_type | color |
|---|---|---|
| 0 | B cell CD20+CD21+ | #1f77b4 |
| 1 | B cell CD20+CXCR5+ | #aec7e8 |
| 2 | CD8 T cell | #ff7f0e |
| 3 | CLEC9A+IDO-1+ | #ffbb78 |
| 4 | DC | #2ca02c |
| 5 | Endothelial cell CD34+CD31+ | #98df8a |
| 6 | Epithelial cell EGFR+betaCatenin+CD138+ | #d62728 |
| 7 | M1 Macrophage CD11c+CD68+ | #ff9896 |
| 8 | M2 Macrophage CD11B+CD163+ | #9467bd |
| 9 | M2 Macrophage CD206+ | #c5b0d5 |
| 10 | MCT+ | #8c564b |
| 11 | NK cell | #c49c94 |
| 12 | Neutrophil | #e377c2 |
| 13 | Plasma cell | #f7b6d2 |
| 14 | Treg | #7f7f7f |
| 15 | Treg CCR7+ | #c7c7c7 |
| 16 | Treg IDO-1+ | #bcbd22 |
| 17 | vessel aSMA+ | #dbdb8d |
sp.pl.catplot(
adata,
color = "cell_type", # specify group column name here (e.g. celltype_fine)
unique_region = "condition", # specify unique_regions here
X='x', Y='y', # specify x and y columns here
n_columns=2, # adjust the number of columns for plotting here (how many plots do you want in one row?)
palette=cell_type_colors, #default is None which means the color comes from the anndata.uns that matches the UMAP
savefig=False, # save figure as pdf
output_fname = "", # change it to file name you prefer when saving the figure
output_dir=output_dir, # specify output directory here (if savefig=True)
figsize= 17,
size = 20)
| | x | y | cell_type | condition |
|---|---|---|---|---|
| 1 | 1472.197452 | 5.356688 | Neutrophil | tonsillitis |
| 2 | 1505.800000 | 5.072727 | Neutrophil | tonsillitis |
| 5 | 1485.843023 | 9.220930 | Neutrophil | tonsillitis |
| 8 | 1518.109589 | 9.616438 | Neutrophil | tonsillitis |
| 9 | 1582.630252 | 11.563025 | Neutrophil | tonsillitis |
| ... | ... | ... | ... | ... |
| 22253 | 1313.725191 | 2521.114504 | Plasma cell | tonsillitis |
| 22254 | 1331.719512 | 2522.784553 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
| 22255 | 1456.914062 | 2521.546875 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
| 22258 | 1383.661972 | 2523.711268 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
| 22259 | 1420.271739 | 2524.836957 | Epithelial cell EGFR+betaCatenin+CD138+ | tonsillitis |
16038 rows × 4 columns
# cell type percentage table and visualization (selected cell types only)
ct_perc_tab, _ = sp.pl.stacked_bar_plot(
adata = adata, # adata object to use
color = 'cell_type', # column containing the categories that are used to fill the bar plot
grouping = 'condition', # column containing a grouping variable (usually a condition or cell group)
cell_list = ['CD8 T cell', 'Treg', 'B cell CD20+CXCR5+', 'NK cell', 'B cell CD20+CD21+',], # list of cell types to plot; to see all cell types, use adata.obs['cell_type'].unique()
palette=cell_type_colors, #default is None which means the color comes from the anndata.uns that matches the UMAP
savefig=False, # change it to true if you want to save the figure
output_fname = "", # change it to file name you prefer when saving the figure
output_dir = output_dir, #output directory for the figure
norm = False, # if True, the plotted values are scaled to sum to 1
fig_sizing= (6,6)
)
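The difference between raw and normalized percentages for a subset of cell types can be illustrated with plain pandas. The sketch below is based on one reading of the `norm` comment above and has not been checked against the SPACEc implementation; the `obs` table, including the `tonsil` condition, is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs with two conditions
obs = pd.DataFrame({
    "condition": ["tonsil"] * 4 + ["tonsillitis"] * 4,
    "cell_type": ["CD8 T cell", "Treg", "DC", "DC",
                  "CD8 T cell", "CD8 T cell", "Treg", "DC"],
})
cell_list = ["CD8 T cell", "Treg"]

# Percentages relative to ALL cells of each condition, restricted to cell_list
perc_all = pd.crosstab(obs["condition"], obs["cell_type"], normalize="index") * 100
perc_all = perc_all[cell_list]
print(perc_all)

# Rescaled so that the selected cell types sum to 100 % within each condition
perc_norm = perc_all.div(perc_all.sum(axis=1), axis=0) * 100
print(perc_norm)
```

Without rescaling, the bars show each selected cell type as a share of all cells, so they do not reach 100 % when only a subset is plotted. With rescaling, the bars show the composition within the selected subset only.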
sp.pl.create_pie_charts(
adata,
color = "cell_type",
grouping = "condition",
show_percentages=False,
palette=cell_type_colors, #default is None which means the color comes from the anndata.uns that matches the UMAP
savefig=False, # change it to true if you want to save the figure
output_fname = "", # change it to file name you prefer when saving the figure
output_dir = output_dir #output directory for the figure
)